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Abstract 

An extension to the P 3 M algorithm for electrostatic interactions is presented, that allows to 
efficiently compute dipolar interactions in periodic boundary conditions. Theoretical estimates for 
the root-mean square error of the forces, torques and the energy are derived. The applicability 
of the estimates is tested and confirmed in several numerical examples. A comparison of the 
computational performance of the new algorithm to a standard dipolar Ewald summation methods 
shows a performance crossover from the Ewald method to the dipolar P 3 M method for as few as 
300 dipolar particles. In larger systems, the new algorithm represents a substantial improvement 
in performance with respect to the dipolar standard Ewald method. Finally, a test comparing 
point-dipole based and charged-pair based models shows that point-dipole based models exhibit a 
better performance than charged-pair based models. 
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I. INTRODUCTION 



Dipolar interactions are important in many soft-matter systems ranging from dispersions 
of magnetic micro- and nanoparticles (ferrofluids) and electrorheological fluids to magnetic 
thin films and water^EEEEl. Numerical simulations play a central role in explaining and 
unravelling the rich variety of new and unexpected behavior found in recent theoretical 
and experimental studies on dipolar systems^. Especially for systems which possess point- 
dipolar interactions such as dipolar model systems used in analytical theories, or ferrofluids, 
a numerical algorithm based on truly dipolar interactions is needecP^. Periodic boundary 
conditions are frequently used in these simulations in order to approach bulk systems within 
the limits of currently available computers (see refP for a detailed discussion about the 
adequacy of such methods to describe electrostatic systems). If a system of N particles with 
positions in a cubic box of length L that carry point dipoles {^j}*^^ is considered, 

then the total electrostatic energy under periodic boundary conditions is given, in Gaussian 
units, by 

1 N N 

U = 2 Yl Yl v ( r *i + nL > Pi) W 



where = r« — rj, and 

v(rij, nu l*j) = (Mi ■ V n ) (fij ■ V rj .) T—r 

I r ij I 

Mi • _ 3 (mj • fjj) (Mj " r ij) /o\ 

_ lr>..|3 " It>.-|5 v > 

I ' ij I I ' ij I 

is the dipolar pair interaction for point dipoles. The innermost sum runs over all periodic 
images of the system, identified by the shifting integer vector n. The prime in the sum in 
eq. indicates that the i = j term must be omitted for n = 0. Note that the dipolar 
sum is conditionally converging^ and its precise value depends on the summation order. 
In what follows we assume eq. Q to be summed over spherical shells (spherical order of 
summation ) 10 * n l 

The force F iy and the electrostatic field Ei acting on a particle % can be obtained by 
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differentiating the potential energy U with respect to and Hi respectively, i.e., 

Ft = -V r XJ (3) 
Ei = -V W 17. (4) 

In the case of dipoles, these quantities are related via F(r) = V r (/i. • E(r)). For point- 
dipoles, the torque Tj acting on particle i can be related to the electrostatic field at the 
position of the particle as 

Ti = fii x Ei (5) 

Performing the direct summation of the interactions (eq. (II])) is impracticable beyond a 
few particles due to the slow convergence of the innermost sum and the quadratic scaling 
with the total number N of particles in the outer sums. However, algorithms have been 
proposed to speed up the computation of dipolar interactions: the dipolar Ewald su m 1 12 1 13 1 , 
the dipolar Lekner surrP, the (Smooth) Particle-Mesh Ewald methods: PME and SPMEp^, 
and Multipole Methods (MM): Fast-MM, and Cell-MM 2211312113 . For a general overview of 
these algorithms, see the reviews in refsP^. 

Although the Ewald summation is significantly better than direct summation from a 
computational point of view, it still exhibits an unfavourable 0(N 3 ^ 2 ) scaling with the 
number of particles®'. By contrast, Multipole methods scale linearly, but have a large 
prefactor in the O(N) scaling. In the case of point charges, Multipole methods have been 
found to be superior to mesh methods only for very large systems N > 100000 (see discussion 
in refP^ ancP^). For systems of moderate size, optimal algorithms are those that take 
advantage of the Fast Fourier Transform (FFT) in order to compute the Fourier contribution 
to the Ewald sum, which are commonly known as particle mesh methods: PME, SPME, and 
Particle-Particle-Particle Mesh (P 3 M), which is introduced in this article. These methods 
are all 0(N log N), i.e. they exhibit a nearly linear scaling with the number of particles. 

When computing Coulomb interactions, the P 3 M method^ achieves the highest accuracy 
among the particle mesh methods, thanks to its use of the optimal lattice Green function that 
is designed to minimize root-mean-square (rms) errors^^. The PME and SPME algorithms 
have already been generalized to compute dipolar interactions^. In this paper we perform 
a similar generalization, but for the P 3 M algorithm. An advantage of the P 3 M approach 
is that it provides theoretical estimates for the rms accuracy of the forces, torques and 



energy as by-products. These estimates give valuable information about the accuracy of the 
algorithm without having to perform tedious benchmarking, and they allow for the tuning 
of the algorithm to yield minimal computing time at a given level of accuracy. No such 
theoretical error estimates are currently known for the dipolar PME nor SPME methods. 

To verify the applicability and correctness of the method presented in this article and 
to be able to perform the numerical tests, the method was implemented in the simulation 
package ESPResScP^, and it will be contained in one of the coming releases of the software. 

The outline of this paper is as follows. The basic formulas for the Ewald summation of 



dipolar interactions are recalled in Set. II A In Set. II B, Hockney and Eastwoods's P M 



algorithm is extended to compute dipole-dipole interactions. A correction term that must 
be applied to any dipolar energy when computed via particle-mesh-methods is derived in 



Set. II C Theoretical estimates for the rms error of forces, torques, and energy as computed 



by P 3 M are presented in Set. III. Numerical tests of the accuracy of the error estimates 
are made in Set. |IV| In Set. [V] several issues related to the computational efficiency of the 
method are discussed: performance of the method when compared to the traditional dipolar 
Ewald sums, suitable approaches to make a fast implementation of the method in constant- 
pressure simulations, and a comparison of the efficiency of dipole-based and charge-based 
models to mimic dipolar systems. Technical details for building up the P 3 M dipolar method 
are given in App. [A] while App. [B] derives and discusses the rms error estimates. 

II. THE DIPOLAR P 3 M METHOD 

In this section the dipolar P 3 M algorithm is presented by first recalling the basics of 



the dipolar Ewald summation in which the new method has its roots (see Set. II A). The 



details of the new algorithm are presented in Set. II B The effect of discretization errors 



in Madelung-Self interactions (those of a particle with its periodic images and itself) is 



discussed and a correction term to remove a bias in the energy is derived in Set. |II C 
The different Fourier Transforms as well as the domains to which they apply are defined in 
Table I. In the following, we assume a cubic box, but the generalization to triclinic boxes is 



straightforward, see for instance ref.^for an implementation in PME and SPME algorithms. 
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A. Ewald summation with dipolar interactions 



The fundamental idea of the Ewald summation (and its advanced implementations like 
the particle mesh methods PME, SPME and P 3 M) is to calculate energies, forces, and 
torques by splitting the long-ranged dipolar pair-interaction into two parts, 

v(r,fii,fij) = (ft ■ V r J (nj ■ V rj ) (^Oij) + <f>{rn)), (6) 

where ip(r) contains the short-distance part of the Coulomb interaction, and 4>{r) contains 
its long-distance part (4>(r) must moreover be smooth everywhere and regular at the origin). 
The standard way to perform this splitting is to set 
. , . erfc(ar) 

) = ^—L, r=\r\, (7) 

r 

,. . erf(ar) . . 

0(r) = -±-L, (8) 

though other choices are possible^ 28 * 29 * 30 * 31 !. The inverse length a, which is often referred to 
as the Ewald (or splitting) parameter, weighs the importance of one term with respect to 
the other, and can be chosen so as to optimize the performance. The interactions associated 
to the function ip are short-ranged and they can hence efficiently be summed numerically. 
The interactions associated to the function cf) are long-ranged in real space, but short-ranged 
in the reciprocal Fourier space, and can therefore be efficiently computed in that latter 
space. The decomposition of the potential leads to the well-known Ewald formula for the 
electrostatic energy of a system of dipoles (see details in ref J 6 * 10 * 11 -^) 

U = U {r) + U {k) + f/ (sclf) + t/ (surf) (9) 

where the real-space energy U^ r \ the reciprocal-space energy U^ k \ the self-energy 

[/(^lf) and 

the surface f/( surf ) contributions are 
1 - 

Uir) ^EE^' V n) ■ V.,) V>(ry) (10) 
i,j=i «ez3 

uik) = • ik \ 2 i(k) (11) 

fc^O 
fe6K 3 

o 3 N 

^ (self) = -|^£^ ( 12 ) 



i=l 

TV N 



V I i=l j=l 
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where V = L 3 is the volume of the box, and e' is the dielectric constant of the medium 
surrounding the replica boxes: e' = 1 for vacuum, and e' = oo for metallic boundary 
conditions. Because of the periodic boundary conditions, wave vectors feeK 3 are discrete 



where K 3 = {2ttti/L : n e Z 3 }. In Eq. (11), p{k) is the Fourier transform of the periodic 
dipole density 

TV 

p(r) = ^/M(r-r i ), r E V, (14) 

i=i 

which reads, 

N 



In (11), the Fourier transform 4>(k) of the reciprocal interaction (ISJ) is 



0(fe) = J <J)(r)e- ik - r dr = ^e~ k2/4a \ (16) 

The term £/( self ) subtracts the unwanted self-energies that are included in the reciprocal 
energy U^ k \ where the self-energy of a dipole is defined as the reciprocal interaction of the 
dipole with itself: lim T ,^ (— |) (/z, • V r ) 2 0(r). It should be remarked that the expression 



given in eq. (13) for the surface term is valid only when a spherical order of summation is 
used in the calculation of the direct su m 110 1 11 1 , eq. 0. In that case, eqs. Q and eq. @ lead 
to identical values, provided that the interaction energy of the dipoles with the surrounding 
medium of dielectric constant e' at infinity is added to eq. Q (e' = 1 was assumed when 
writing Q). Notice that the surface term vanishes if metallic boundary conditions (e' = oo) 
are used. 

Ewald expressions for the force and electric field acting on a dipole i follow from eqs. 0, 
and @: 

Fi = FW + F™ (17) 
Ei = E\ r) + Ef ) + ^ (self) + ^ (surf) . (18) 

where the superscripts (r) and (&) denote the real-space and reciprocal-space contributions. 
Notice that there is no self- nor surface-contribution to the force because the self- and 



surface-energy terms (eqs. (12) and (13)) are independent of the particle positions. By (p 



the torque on dipole i follows directly from the electric field: Tj = Pi x Ei. The reader is 



referred to refP^ for fully explicit Ewald formulas for the real space and reciprocal space 
contributions to the force and torque. For further reference, it is worth noting that the 
reciprocal space contributions to the force and electrostatic field can be written as 



E 



FT -1 

FT fcio 



FT -1 



ik (p(Jfe) ■ ik) 4>(k) 



ik(ni ■ ik) (p(k) ■ ik) (j)(k) 
/V FTj^o [JB« ik] + fi ity FT^ 
t^i,x FT^q E^ ik x + FT fc _^ 



ik 

E {k) ik, 



+ Hi,z FT fc ^ 



Ei k) ik 



(19) 



(20) 



i; (fc) ik, 



where the inverse Fourier series FT _1 [- ■ ■ ] is defined in Table I (the k = term must be 
excluded in the back-transformation), and the components of the Fourier transform of the 
electrostatic field are E^ = (Ex , Ey, Eg), and p,i = (jj ijX , ^ y , fi i)Z ) is the dipole moment 
of particle i. The last equality for the force arises from the fact that F(r) = V r (p- E(r)) = 
(p ■ V r )E(r) in electrostatics (VxE = 0). 

From a computational point of view, the Ewald method requires therefore to first Fourier 
transform the dipole distribution to the reciprocal space, then to solve the Poisson equation 
in reciprocal space [which reduces to a simple multiplication by 0(fe)], and finally to Fourier- 
back-transform the results to real space. 



B. Algorithmic details of the mesh calculations 

What distinguishes the particle mesh methods from the Ewald summation is that, while 
Ewald summation uses the standard Fourier series to compute the reciprocal space contribu- 
tion, particle mesh methods use Fast Fourier Transformations (FFT), thereby reducing the 
computational effort from 0(N 3 ^ 2 ) to 0(N log N). However, since FFT is a mesh transfor- 
mation, it is necessary to: (1) Map the dipole moments from continuous positions onto lattice 
points (which will be referred to as dipole assignment to the mesh sites); (2) Fast-Fourier 
transform the mesh and solve the Poisson equation on the (reciprocal) mesh; (3) Fourier 
transform the mesh back to real-space, and interpolate the results onto the continuous dipole 
positions. 

The computation of the real-space contribution U^ r ' in the Ewald formula is kept un- 
changed, and the reader is referred tcP^for explicit formulas. In the following, we discuss in 
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detail the mesh calculation in the case where the zfc-differentiation scheme is used. Other 
differentiation schemes can be easily implemented, see^l for details. 

The mesh is assumed to be a cubic FFT mesh with the lattice spacing given by h = L/Nm, 
where Nm stands for the number of mesh points in each direction. We denote by M 3 the 
set of all points belonging to the mesh: M 3 = {nh : n G Z 3 ,0 < n Xt y tX < Nm}- An 
index ! M' is attached to any quantity defined at mesh points only, e.g. the mesh-based 
dipole density pM( r m) or the mesh-based electric field E M (r m ), r m G M 3 . The inverse fast 
Fourier transform FFT _1 [/] corresponds to a truncated Fourier series over wave vectors in 
one Brillouin zone (see Table I ). By convention, this zone is taken to be the set of wave 
vectors M 3 = {2irn/L : n G Z 3 , |n Xj?;j2 | < Nm/2}, which we call the "reciprocal mesh" or 
first Brillouin zone. The number of mesh points per direction Nm should preferably be a 
power of two, because in that case the FFTs are computed more efficiently Notice that with 
this definition, the reciprocal mesh is symmetric: if wave vector k belongs to the mesh, so 
does —k. 



1. Dipole assignment 

The dipole density PM{fm) on the mesh is determined from the A" dipolar particles 
{{vi. Hi)} by the assignment function W(v) that maps the particles from their continuous 
positions to the mesh, 

1 N 

pM{r m ) = 73 ^2 ViW(r m - n), (21) 

i=l 
m.i. c. 

where minimum image convention (m.i.c.) is used when computing relative distances r m — 
tv We use the same assignment functions W(r) as defined by Hockney and Eastwood 
in the original P 3 M method for Coulomb interactions^, which are (shifted) B-splines and 
are tabulated in ref.^. The assignment functions are classified according to the number P 
of nearest grid points per coordinate direction over which the dipole is distributed. The 
quantity P is referred to as the assignment order parameter. A formal expression for Hockney 
and Eastwood's assignment functions is W^ p \r) = W^ p \x)W <yP \y)W ( ^ p \z) where 

«^W=(x[T.5l*-*x[^,i])(=) P2) 

S v ' 

P— fold— convolution 
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and x[~T7 1] i s the characteristic function, i.e., the function that is 1 within this interval and 
outside. 



2. Solving the Poisson equation 



The reciprocal electrostatic energy, and electrostatic field are computed at each mesh 



point r m by approximating equations (11), and (19) by 



U M=^Y,\pM{k)-D{k)^G{k) 



E {k) (r ) 



fegM 3 
fc^O 

= FFT 



fc^O 



E 



(fc) 

M 



FFT fc ^ 



D{k)(p M {k)-D{k))G{k) 



(23) 



(24) 



Here, Pu{k) is the fast Fourier transform of the dipole density Pm{ t ) on the mesh. The 
k = term is excluded in the inverse transform FFT -1 of all mesh-based quantities as in 



reciprocal Ewald terms eqs. (11), (19), and (20). The function 
D(k) =ik, fee M 3 , 



(25) 



is the Fourier expression of the gradient operator on the reciprocal mesh. G(k) is the lattice 
Green function, also known as the influence function, and it is defined below at the end of 



Sec. B [see eq. (30)]. It should be remarked that both D(k) and G(k) are periodic in K 3 , 
with the period given by the first Brillouin cell M 3 , i.e., period 2ir/h. 



Note that eqs. (23) to (24) correspond to the reciprocal Ewald formulas recalled in 



Set. II A but are modified in two ways: the FT of the dipole density is replaced by a 
FFT of the mesh dipole density and the (continuous) reciprocal interaction (j>(k) is replaced 
by a discrete lattice Green function G(k). A fundamental idea in the P 3 M method is that 
the lattice Green function is not simply taken as the continuum Green function (p(k), but 
it is considered as an adjustable function whose form is determined by the condition that 
the mesh based calculation gives results as close as possible, in a least-square sense, to the 



results of the original continuum problem (see below Sec. II B 4 for more details) 



3. Back-interpolation 

The mesh based electrostatic field is finally interpolated back to the particle positions 
7*j (and possibly also to any other point in the simulation box) using the same assignment 



function W(r) and the minimum image convention (in. i.e.): 

E W(r t ) = ^£\r m )W(r m -r t 



(26) 



m.i.c. 



Once the electric field is known, the torques are obtained by eq. ^ and the electrostatic 
energy of dipole i, is given by 

U^ k) = -IU'E^ k Hr i ). 



(27) 



Note that if only the total electrostatic energy is needed, it can be obtained via eq. (23) 
which does not need any inverse Fourier transform nor back-interpolation. 



The force acting onto a particle i can be obtained by analogy with eq. (|20|) as 
= Yl W{r m -r, 



m.i. c. 



FFT 



fc^O 



+ 



FFT 



E { M] y D{k) 



+ Vi, z FFTfc^o 



(k) p(fc) 

M,x ) M,y ) "^M.z 



where the reciprocal mesh electrostatic field is -E 



(fe) 



(28) 

In the last 



formula the differential operator and the electrostatic field can be permuted as in eq. (20) . 



The differentiation used in step 2 and in eq. (28) (the so-called ik- differentiation or force 



interpolation scheme which consists in multiplying the reciprocal mesh by D(k) = ik) is 
the most accurate variant when combined with the assignment scheme employed in section 



II B 1| Note, however, that to compute the forces and electric field vectors, it requires the 
back-FFT of vectorial quantities. By contrast, in the analytical differentiation scheme as 
used in the SPME algorithm, the forces and electrical field vectors are derived in real space 
from the back-transformed potential mesh with the subsequent saving of FFT's. Analytical 
differentiation leads however to forces that violate Newton's third law and hence that do not 
conserve momentum. A global correction can be applied to restore conservation of the total 
momentum, but its effects on the physics of the system is difficult to assess. An algorithm 
that uses analytical differentiation without introducing such spurious forces is currently 
under study. 



4- The lattice Green function 

The optimal lattice Green function to compute dipolar interactions can be found by 
minimizing the rms error in the (reciprocal) pair interaction between two unit dipoles 
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in the simulation box: 



QL[T (k) ] 



/>WA3 d H dr2/d0l/d " 2 



[T^(r u fa, r 2 , fa) - T (ex ' fc) (n, fa, r 2 , fa)} 2 (29) 



where T^ ex ' k \ri, fj,i, r 2 , fa) is the exact (reciprocal) dipolar Ewald interaction (energy, elec- 
trostatic field, force or torque) between two dipoles, and T^(r±, r 2 , n 2 ) is the P 3 M pair 



interaction. The quantity Qf nt defined in (29) is the squared error of the P M interaction 



averaged over all positions and orientations of the two dipoles in the simulation box. Notice 
that the average over ri has been restricted to a single mesh cell h 3 thanks to the periodicity 
of the system. 



The optimal influence function which result from the minimization of eq. (29) is found 
to be (see App. [A} 



D(k) -ik r 



U(k m )) 4>(k 



G Q pt{k) 



D(k) 



2,9 



(30) 



where k m = k + (2ir/h)m, U(k) = W(k)/h 3 , and W(k) is the Fourier transform of the 



assignment function defined in eq. (22) 



W(k) 



( sxui^kxh) sia(-^kyh) sin(^k z h) 



(31) 



V {\k X h){\kyh){\k Z h) 

The influence function for dipolar forces is obtained by setting S = 3 in the previous 
expression. The value S = 2 refers to the optimal influence function for the dipolar torques, 
energy, and the electrostatic field. 

The form of these influence functions resembles the influence function obtained by Hock- 
ney and Eastwood for Coulomb forces (S = 1). It should be remarked that the use of the 
different influence functions to compute the forces and torques does not imply any noticeable 
time overhead because influence functions are computed and stored at the beginning of the 
simulation, and they remain unaltered throughout the whole simulation. 

When implementing the method, it is important that the reciprocal mesh is symmetric 
to avoid systematic biases on the computed quantities (see App. [B[pH 
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C. Madelung-Self interactions and correction term for the energy 



Fast-Fourier-transforms greatly accelerate the calculation of the Ewald reciprocal interac- 
tions, but have the drawback of introducing discretization errors in the computed quantities. 
On the one hand, these errors arise from truncation of the Fourier series, as wave vectors 
greater than 2ir/h are discarded in the mesh calculation, and on the other hand from alias- 
ing, which is caused by band-folding in Fourier space due to undersampling of the continuous 
dipole distribution^. The discretization errors do not necessarily average to zero, so P 3 M 
quantities may be biased. This is the case for the reciprocal energies computed on the mesh, 



which need hence to be corrected by applying a shift which is determined below [eq. (37) 
No similar correction needs to be applied to P 3 M forces and torques. 

1. Madelung-Self interaction 

The bias in the P 3 M energies originates from the fact that the Madelung and self in- 
teractions are not fully accounted for in the mesh calculation. For Coulomb interactions, 
the issue has been discussed in detail by Hiinenberger^ and Ballenegger et alP^. The exact 
Madelung interaction (energy, force or torque) is defined as the interaction of a dipole with 
all its images in the periodic replicas of the simulation box: 

^lilun>) = \ E < nL >»>») ( 32 ) 

ngZ 3 

where the sum over images must be performed in concentric shells and the vacuum boundary 



condition (e' = 1) is employed in (32). The Madelung energy depends only on the dipole 
moment /j, and the length L of the cubic simulation box. Due to the specific form of the 
dipolar interaction (pj), the sum in (32) vanishes, as proved by de Leeuw et al™. Conse- 



quently, the exact Madelung dipolar energy, force and torque are zero. Notice that the use 
of the Ewald summation ^ to compute the Madelung energy (32) leads to the relation 



TT (ex,r) , x rr (cx,fc) , x _ 2a 3 fi 2 2lTfi 2 _ 

^Madelung I A* ) + U Madelung I A* ) 3^/— + 3^3 ~~ U ' \ 66 ) 

However, if this energy is computed with the P 3 M algorithm, for example by putting a single 
dipolar particle in the simulation box, the obtained energy U(r, fi) differs from zero because 
the dipolar interactions with the images of the dipole are only approximately accounted 
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for. Furthermore, the (reciprocal) interaction of the dipole with itself, which is included 
in the mesh calculation of U^ k \r, fx), is also only approximately accounted for because 
of the discretization errors. The later subtraction of the exact self-energy by the term 
jj(seH) _ —2a 3 fj 2 / (3\/7r) will therefore not exactly compensate the unwanted self-interaction. 
These two effects are responsible for a systematic bias in the P 3 M energies because the 
discrepancy between the exact and P 3 M values does not vanish on average. We call the sum 
of the Madelung and self-interaction the "Madelung-Self" (MS) interaction. More precisely, 
it is defined as the sum of the direct and reciprocal space contribution to the energy (or 
force or torque) in a one particle system, namely 

U ms (r, fi) = f/^ delung (r, fi) + C/£ delung (r, fi) (34) 

(with this definition, U ms is independent of the choice of the boundary condition e'). Contrary 



to the exact MS energy, which reads, from (33) 

u ms lP7 — u Madelung I A*J ' u Madelung I A* ) 



(35) 



3v^F 3L 3 



the MS energy in P 3 M (|34|) depends in general both on the position and on the orientation 



of the dipole moment because of the mesh calculation. 

2. Correction term for the P 3 M energy 

The error in the P 3 M energy of a dipolar particle located at r with dipole moment \i in 
direction (x is 

AU(r, ti) = ^ (U ms (r, £) - Uffl(p,)), (36) 

where we factored out the magnitude fi 2 . This error does not vanish when averaged over all 
positions and orientations of the dipolar particle. The sum of these average errors for all 
dipoles {/*i}i=i,...,Af provides the correction term 

<f/ (corr) ) = —M 2 (AU{r, p,)) (37) 

that must be added to the P 3 M energies to remove the bias (at least on average). In eq. (37), 



N 



M 2 = J2»l (38) 



=i 
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and the average error (AU(r, ft)) is easily determined analytically. Indeed, we have 

(AU(r,ft)) = (U<g{r,ji)) ~^ + ^ (39) 



where we used (36), (35) and the fact that there is no real-space contribution to the MS 



energy in the P 3 M calculation when the minimum image convention (m.i.c.) is used. The 



average reciprocal-space MS energy is calculated in App. B 2 and reads 



fe#o 

with k m = k + (2,Tv/h) m. 

In conclusion, the corrected formula for the P 3 M energy is 

U P3M = U (r) + U$ + U (scli) + f/ (surf) + (U {coiT) ) . (41) 

Note that the correction term only needs to be computed once at the beginning of the 
simulation, hence it is inexpensive in CPU cost, but its usage can improve the accuracy of 
the dipolar P 3 M energies by several orders of magnitude (e.g. inset of Figure pi) depending 
on the values of the mesh size Nm and the Ewald splitting parameter a. 

3. Madelung-Self forces and torques 

Since each dipole in P 3 M is subject to a position- and orientation- dependent MS energy 
U ms (r, fjt), it can be expected from relations ([3|-(|4]) that it will also experience an MS force 



and an MS torque. The P 3 M force is obtained from the mesh using eq. (28) (instead of 
eq. (l3l), and it is proved in App. B2 that the MS force cancels out. Consequently, P 3 M 



conserves the momentum in difference to SPME, for example. In the same appendix, it is 
also shown that a non-vanishing MS torque does arise in the mesh calculation. However, on 
average this MS torque vanishes and does therefore not result in a systematic bias to the 
torques. 

The results on MS interactions are summarized in Table II. The fluctuating errors in MS 
interactions have an impact on the accuracy of the computed quantities. The rms error 
estimates for P 3 M energies and torques are therefore more difficult to obtain than the one 
for forces (see next section). 
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We stress that MS interactions are common to all particle mesh methods, and the explicit 



expression for the possible biases (such as the energy correction (37)) depends on the details 
of each algorithm. This is the first work, together with 35 , in which the effect of the MS 
interactions is thoroughly assessed in a particle-mesh method. 

III. ERROR ESTIMATES FOR THE DIPOLAR P 3 M ALGORITHM 

In this section, theoretical error estimates for the root-mean-square (rms) error of the 
energy, forces and torques for the P 3 M algorithm are presented. The accuracy of the P 3 M 
method depends on the chosen values for the parameters of the method: the Ewald splitting 
parameter a, the real-space cut-off distance r cut , the mesh size N M and the assignment order 
P, as well as on parameters of the system: the number of particles N, the box length L and 
the sum over all squared dipole moments, M 2 . 

It is very useful to have formulas that are able to predict the error associated to a set of 
parameter values. Not only do such formulas enable the user to control the accuracy of the 
calculation, but they also allow for an automatic tuning of the algorithm, so that it can run 
at its optimal operation point, thus saving computer time. 

A measure of the accuracy is given by the rms error defined by 



AT 



\ 



1 N \ 

-^(T«-TM(,)) 2 ) (42) 



i=i 



where T(i) is the value of T (for example electrostatic field, force, torque or energy) asso- 
ciated to particle i as obtained from the P 3 M method, and T^ ex \i) is the exact value as 
defined by the direct summation formulas (eqs. Q, (|3]), Q). The angular brackets denote 



an average over particle configurations. In (42), i is a short-hand notation for (rj,^j). In 



the case where the total electrostatic energy U is measured, the rms error is defined by 

AU = ^j({U -U^) 2 }, (43) 
where U is the corrected P 3 M energy (41), and f/( ex ) is the exact energy Q. 



Eqs. (42) and (43) are calculated analytically in the App. B to get useful error estimates 



as functions of the various parameters. The calculation is done under the assumption that 



the positions and orientations of the dipoles are distributed randomly. In Set. IV it is shown 
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that our rms error estimates still accurately predict the errors for dipolar systems in which 
the dipoles are strongly correlated. For random systems, the average over configurations 
reduces to 

where J . . . di denotes integration over all positions and orientations of particle i. 

As shown in App. |Bj the rms error arises from two distinct contributions: errors in the 
interaction of a particle i with a particle j 7^ i (including the images of particles j in the 
periodic replicas of the simulation box), and errors in the Madelung-Self interactions of each 
particle. The first contribution is denoted by the subscript int, while the latter contribution 
is denoted by the subscript ms. In App. |Bj the following three rms error estimates for the 
dipolar P 3 M method are derived. 



A. Error in the dipolar forces 

The rms error estimate for dipolar forces is given by 

(AF) 2 ^(AFM) 2 + ^Q? nt [FW], (45) 



where AF^ is the real space error 
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AF (r) „ M 2 (ya A rl ut N)~ 112 [— C 2 C + —D 2 C - -C.D^e-^ (46) 
C c = 4a 4 r c 4 ut + 6a 2 r 2 ut + 3 (47) 
D c = 8a 6 r c 6 ut + 20a 4 r c 4 ut + 30a 2 r c 2 ut + 15 (48) 

and QfnJ-F 1 ^] is given by the general expression Qf nt [T^] in which the optimal influence 
function G opt (k) is used, namely 



fe6 M3 \ m &? {D(k)) 2S Em£Z3 (U{km))' 



(49) 



using the parameters (S — 3, a — 1) for dipolar forces. The short hand notation k m = 



fe+ ym is used. 
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B. Error in the torques 



The rms error estimate for dipolar torques is 



(Ar) 2 ^(Ar«) 2 + ^Qf nt |r 



n 2 r-T-( fc )i 



N 



Qr 



where the real-space contribution Ar^ is 



At« * M 2 (Va*rl t N)- 1/2 \\ B 2 C + -^e~^, 



1 

-( 

5 



(50) 



(51) 



with B c = 2a 2 r 2 ut + 1 and QI^t^} is given by (49) using (S = 2, a = 2). The expression 
for Qms[ r(A ^] reads 



QL[r {k) ] = ^ E E G ^ KD(k),D(k')) 



(52) 



fceM 3 fe'eM 3 



E E p fc <M fc D ^(^ 

tez 3 iez 3 mez 3 



Ztti/ 



where 



/i(a, b) 



2(a-b) 2 



1 f\a + b\ 4 + \a- b\ 



and fc f 



5 V 2 
fc + (2n/h)a , k af3 = k + {2-k/K){ol + (3). 



a 



(53) 



The expression in eq. (52) is certainly cumbersome, it involves a 15-fold sum which 



renders the expression difficult to evaluate. A very easy way to substantially reduce the 



time needed to compute eq. (52) is to skip the inner loops whenever their maximal value is 



smaller than a desired accuracy. An additional reduction in the computer time by roughly a 
factor 64 can be obtained if one takes into account that aside of the function h(D(k), D(k')), 
the remaining coefficients are symmetric with respect to the sign inversion of each one of 



the components of the vectors k and k'. In fact, it is shown in Set. IV, that in practice 
the optimal performance point can be located with sufficient accuracy when ^f 1 ' Q^,[t^] 



is completely neglected in eq. (50). 



C. Error in the total energy 



The rms error estimate for the total dipolar energy is 

(A/7) 2 * (AC/«) 2 + 2M*Q 2 nt [Ui k J] + ((A^) ms ) 2 \ - «tf<««>» : 



(54) 
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where U nc is the non corrected energy [obtained by dropping ^[/( corr )^> in 41 . The real-space 
contribution AU^ is 



Al/M - M 2 (Va 4 rJ ut )- 1/2 [^ 2 + \f\ - \ Bc C c ]^e 



(55) 



The value of Q 2 nt \Unc] is given in (49) using {S — 2, a — 1/4). The reduction of the error 



due to the use of the energy correction term (([/( corr ))) 2 can be computed straightforwardly 



from eq. (37). Finally, the contribution to the error arising from the Madelung-Self energy 



(AUnc] ms ) 2 J is quite involved and computationally intensive, and thus of little use for the 
purpose of tuning the algorithm to its optimal performance point. Nonetheless, it is shown 



in Set. [TV] that a reasonable estimate of the error in the energy is obtained by dropping out 

,2 . 



the last two terms ( (AU$ ms ) 2 ) and - «[/( corr ))) z in Q because both terms tend to cancel 



out mutually. The determination of the optimal performance point of the algorithm for the 
energy can be done in just a few seconds using this last approach. The exact expression for 

B 



(Au£] ms ) 2 } is given by p5§ in App. 



IV. NUMERICAL TESTS 



In this section, the reliability of the theoretical error estimates derived in the previous 
section is tested. These theoretical estimates will be compared to numerical errors obtained 



using eq. (42) on configurations of a test system. The exact numerical values T^ ex \i) needed 



to use eq. (42) (or eq. (43) in the case of the total energy) are obtained by a well converged 
standard dipolar-Ewald sum in which all quantities are computed with a degree of accuracy 
5 < 1CT 10 . The dipolar-Ewald sum has been thoroughly tested previously against direct sum 
calculations to ensure its accuracy. On the other hand, the numerical P 3 M forces, torques 
and total electrostatic energy have been obtained using the implementation of the dipolar 
P 3 M-method in the simulation package ESPResSo 2 -. The calculations of the error estimates 
have been done by truncating the aliasing sums over m = (m x ,my,m z ) G Z 3 at \m a \ < 2 
for P = 1, and at \m a \ < 1 for assignment orders P > 1. All the quantities in this section 
are calculated using an arbitrary length unit C and dipole moment unit M.. Therefore, for 
instance, energies and energy errors are given in units of Ai 2 /C 3 . Hereby, the theoretical 
rms error estimates will be plotted as lines, whereas numerical rms errors will be depicted 
by circles. 
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The first test system consist of N = 100 particles with dipole moment of strength fi = 1 
randomly distributed in a cubic box of length L = 10. Figures [T] and [2] show the rms error for 
forces and torques as a function of the Ewald splitting parameter a for a mesh of Nm = 32 
points per direction. The real space cutoff parameter is set to r cut = 4 in all plots unless 
specified otherwise. From the top to the bottom, the order of the assignment function is 
increased from P = 1 to 7. Figure [l] shows, that the theoretical rms error estimate (eq. (45)) 
gives a good description of the numerical rms error in the whole range of values of the 
Ewald splitting parameter a. In the inset of figure [TJ a similar comparison is presented 
for different mesh sizes. From top to bottom the number of mesh points per direction is 
Nm € {4, 8, 16, 32, 64}, and the assignment function is P = 3. A remarkable agreement 
between the theoretical error estimate and the numerical measured error is observed. 



Figure |2| shows that for torques, the rms estimates, eq. (50), give also a good description 
of the numerical rms error for torques in the whole range of a's. The inset in figure [2] shows 



that if the MS contribution is not included in the error estimate for the torques, eq. (52), 
then large mismatches are observed at large a's. Nonetheless, it should be noted, that the 
optimal performance point can be roughly located even when the fluctuating errors in the 
MS torques are neglected. This behavior was confirmed for all cases studied in this work. 



Thus, skipping the time consuming evaluation of the MS contribution (eq. (52)) is a fast 
and reasonably accurate way to determine the optimal performance point for the torques. 
For the forces and torques, even the numerically computed estimate of the rms error of 



a single configuration is an average over the different dipoles (see eq. (42)). However, for 



the rms error of the total energy (43), it is is a single value. To obtain useful statistics, it is 
therefore necessary to average over a set of configurations. 

Figure [3] shows a comparison of the rms error for the energy as a function of the Ewald 
splitting parameter for a mesh of Nm = 32 points per direction. The agreement between the 
theoretical and the numerical rms errors is remarkable. The inset plot in figure [3] shows that 



substantial errors arise when the energy correction term (eq. (37)) is not taken into account 
(dashed lines). The improvement brought by the correction term decreases when the mesh 
size Nm is increased (at fixed number of particles N). Similarly to the case of torques, a 
fast, though approximate, error estimate for the energy can be obtained by dropping out 
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the MS and the correction term contributions in equation (54), i.e. 



(AU) 2 « (Af/M) 2 + 2M'Q 2 mt [Ui% (56) 

This approach predicts quite reasonable errors (compare solid and dashed lines in figure [4]) 
and has the big advantage of being several orders of magnitude faster than the full exact 



error given in (54). It works reasonably well because it turns out that the MS error term 



^(AC/"nc,ms) 2 y for the energy is quite close to the correction error term (([/( corr ))) 2 and 



therefore they almost cancel out completely in (54). Therefore, it is suggested to use (p6 



in place of (54) to roughly localize the optimal performance point of the algorithm for the 
energies. 

In addition, figure [4] shows that the theoretical estimates capture the correct dependence 
of the rms error on the number of particles N and their dipole moments \ fx\. Various number 
of particles and dipole moments were considered: (N = 1000, \n\ = 1), (N = 2000, |/u| = 5), 
and (N = 4000, \n\ = 25). 

The behaviour of the error estimates for the forces, torques, and energy in the previous 
figures shows that the optimal performance point of torques and energy occur roughly at 
the same value of the Ewald splitting parameter a. Notice that when the parameters of the 
algorithm are fixed, the highest accuracy is usually obtained for torques, followed by the 
forces and the least accurate calculation corresponds to the energy. The optimal performance 
point for forces is usually shifted slightly to higher values of the Ewald splitting parameter 
a with respect to the optimal performance point for torques and the energy. The shift 
increases with the number of mesh points Nm and the assignment order P. Far from the 
optimal point, the behaviour of the three error estimates is, as expected, quite different. 
The fact that the optimal point of energies is quite similar to the optimal point for torques, 
which in turn is also not very far from the optimal performance point for forces can be 
used to do a very fast tuning of the algorithm for the three quantities: first, the optimal 
performance point for forces is located using the RMS theoretical estimate for forces (which 
is an immediate calculus). In a second step, this optimal point is used as a starting point to 
seek the optimal performance point for torques. In the third stage, the optimal rms error 
associated to the energy can then be straightforwardly evaluated using the error formulas 



for the energy (56) looking in the neighbourhood of the the optimal performance point a 



obtained for torques. 
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The strongest simplification done to derive the theoretical estimates is the assumption 
that dipole particles are uncorrelated. Nonetheless, tests were performed that have shown 
that the theoretical error estimates are very robust against particle correlations. In figure [5] 
the performance of the theoretical estimates is tested for systems in which strong correla- 
tions exists among the particles. A comparison of the theoretical rms estimates for random 
conformations to the numerical rms errors obtained for forces and torques in a typical fer- 
rofluid simulation^ of 1000 particles with a diameter o w 1.58 is performed. The dipolar 
interaction between particles is characterized by a dipolar coupling parameter A = 3, and a 
volume fraction <ft v = 0.3 [which roughly corresponds to box size L = 19, and M 2 ~ 11858]. 
To add an extra degree of correlation among particles, the system is under the influence of 
an external magnetic field along z axis characterized by a Langevin parameter oll = 2, i.e. 
the characteristic energy induced by the magnetic field is twice the thermal energy. This 
system exhibits dipolar chaining, and hence a high degree of anisotropy. Figure [5] shows that, 
even for this highly correlated system, the measured errors (P 3 M method with N M = 32 
and P = 7) are close to the theoretical estimates for randomly positioned particles. The 
agreement is particularly remarkable near the optimal value of a. Other tests have shown 
similar behaviour. Therefore, the theoretical estimates provide a very good guidance for the 
location of the optimal performance point of the algorithm in the case of correlated systems 
as well. When the theoretical rms error estimates derived for uncorrelated systems are used 
to predict errors in non random systems, it has been observed that the error estimates for 
dipoles perform better than the error estimates for charges. This difference could be due to 
the fact that dipolar particles have rotational degrees of freedom which can further reduce 
the effective degree of correlation respect to a similar system made of charges. 



Finally, tests have shown that the optimal influence functions as defined in eq. (30) 
(S = 3 for forces, S = 2 for dipolar torques and energy) can be used interchangeably with 
very little impact of the accuracy of the results, especially in proximity to the optimal value 



of a. This is due to the exponential decay of the reciprocal interaction <f>(k) (see eq. (16)) 



which renders all terms rn^O negligible in the numerator of eq. (30). Hence, in the tested 



cases, the dipolar influence functions are given in good approximation by 

G(k) = m~ ^ ~ 2 , (57) 

(E^3t/ 2 (fc + mf)) 

which is actually the optimal lattice Green function for computing the Coulomb energ^ ' 
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The latter function has a broad applicability because it incorporates the main effect of 
the P 3 M optimization, which is to reduce the (continuous) reciprocal interaction by some 
fraction, to compensate for aliasing effects that are inherent to the mesh calculation. 



V. COMPUTATIONAL PERFORMANCE 
A. Comparison against dipolar-Ewald sums 



Due to the replacement of the Fourier transforms by FFT routines, see eq. (24) and 
(28), the P 3 M algorithm is not only fast but its CPU time shows a favourable scaling with 



particle number. If the real space cutoff r cut is chosen small enough, (so that the real space 
contribution can be calculated in order N), the complete algorithm is essentially of order 
iVlog(iV) as shown in figure [6] In this figure, a comparison of the presented dipolar-P 3 M 
and dipolar-Ewald sum methods at fixed level of accuracy for the dipolar force AF = 10~ 4 is 
shown. Parameters in both methods have been chosen to minimize computational time given 
the imposed accuracy, with the only constraint that the algorithm must satisfy the minimum 
image convention (r cut < L/2). Figure [6] and additional tests performed at AF = 10~ 6 point 
out that the dipolar-P 3 M algorithm is faster than the dipolar Ewald sum for iV > 300. The 
inset in figure [6] shows the relative speed of the P 3 M to the Ewald method as a function of 
the number of particles in the system. 

B. Constant Pressure dipolar-P 3 M simulations 

The P 3 M method relies on the use of the influence function G(k) which depends on the box 
parameters, L in our cubic geometry. This means that in ensembles where the volume is not 
a fixed quantity the recalculation of the influence functions is needed whenever L is changed. 



The repetitive update of G(k) via eq. 30 or eq. 57 can be computationally expensive. In 
the case of Coulomb systems, the use of P 3 M algorithms for constant pressure simulations 
has been studied by Hiinenberger^l for both isotropic and anisotropic coordinate scalings. 
The closest approach in our case to the method proposed irP^l for the isotropic scaling from 
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a system with size Lm to a system with size Li 2 ) would consist on using the transformations 



a (l)L(l) = Q!(2)L(2), 
"(1)^,(1) = Ot(2)Tcut,(2)- 



(58) 
(59) 



Indeed, due to the equality given in eq. 58 the following simple relation between optimal 
influence functions is obeyed 



G(i) 



L{2) 



G 



(2) 



(60) 



if the mesh-size Nm and influence order P are unaltered. Under such conditions, it is simple 

ensures that if (am,Lm) minimize Qi nt [T^ k ') also 



to show from eq. 
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that the condition 



58 



does (a (2), -^(2)), where the relation between the value of both minimums is 



QUT (k) h 



(1) 



L 



(2) 



-(2S+2) 



QUT {k %. 



(61) 



It can be analogously shown that the equality given in eq. 59 leads to a similar scaling for 



the real space errors. Thus, recalling the expressions for the rms error estimates (eqs. 45 



50, and 54), the relation between the total errors of both systems is 

AT (1 ) _ ( L (2) ' 

A7}< 



(62) 



'(2) \ L (1). 

where 6 = 4 for the forces, and 6 = 3 for torques and energies. 

Therefore this approach keeps the level of accuracy set initially when we increase the size 
of the system, Lm < L( 2 ) • There is however one caveat: if the size increases too much, it can 
happen that the set of parameters obtained from the previous scaling rules [ same Nm, same 



P, a and r cut deduced from eqs. 58 and 59 ] may not correspond anymore to the optimal 



point of operation of the algorithm. A practical method for dealing with constant (isotropic) 
pressure simulations is then the following: via the analytical error estimates determine the 
optimal values of the parameters for the smallest box-size one expects to have to simulate 
Lmin , use eqs. 58 and 59 to obtain the a and r cut for the current size L of the system, as 



well as eq. 60 to transform from the influence function calculated for L m i n to the one needed 
for L. If L < L mn , 



recompute the influence function via eq. [30] or eq. [57} If L ^> L min , use 
the error estimates to check if the current algorithm parameters ( iV^ ,P and r cut ) are still 
the most optimal ones for speed purposes and the selected level of accuracy. 
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Unfortunately, in the case of anisotropic coordinate scalings an approach for dipoles 
similar to the one suggested by HiinenbergerSS can be as costly as evaluating again the 
whole influence function. No fast alternative to the recalculation of the whole influence 
function seems to exist for this case. 

C. Dipoles versus charge-based system representations 

The most simple approach for producing dipoles would be to use a pair of opposite 
charges, separated by some small distance. This would be simple, and one could use all 
the existing methods for simulating pure Coulomb systems. It is therefore desirable to pro- 
vide guidance about the practical usefulness for Molecular Dynamics simulations of models 
and algorithms based in true point dipole representations, as for instance the dipolar-P 3 M 
presented in this work. 

In this section we compare two different models that are intended to represent the same 
physical system (a ferrofluid): a set of N particles embedded into a cubic box of volume 
V that interact via dipole-dipole interaction (periodic boundary conditions used) plus a 
repulsive soft-core repulsion (Weeks-Chandler- Andersen potentiaP^) which it is of the other 
of ksT when the distance between centers is equal to one diameter a. 

The model relying on true point dipoles^ 3 -^ uses a Langevin thermostat for both trans- 
lational and rotational degrees of freedom of the particles, and the dipolar-P 3 M (ik- 
differentiation) algorithm is used to account for the long-range interactions. The dipole 
moments have been set to fi = 1, and fc#T = 1. 

For the charge-based model, we have taken the most simplistic approach for MD simu- 
lations: the dipole is mimicked via two point charges +q and — q which are separated by 
a distance d such that p = \q\d = fi (Gaussian units). The movement of the two charges 
inside the particle is constrained by a FENE potential between the charges and the center 
of the particle to force the charges to move with the particle, plus a WCA and an angular 
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potential acting between both charges in order to stabilize the dipole: 

V FENE (r qc ) = -KfJ*™* In ( 1 - (63) 

V{9) = ^(9-e o ) 2 , (64) 

f 4e ( ( r -f) 12 - ( r -ff + l) , for ran < 2^d 
V WCA (r qq ) = { y [d) {d ) V qq , (65) 

[ 0, for r qq > 2 1 / 6 d 

where r qc is the distance of a charge to the center of the particle, r qq is the distance between 
both charges, and 9 the angle (in radians) formed by the the two charges and the center of 
the particle. The chosen parameters for the three potentials are r max = 0.8c?, kf = 2000 k E T, 
K a = 1000 k E T, O = 7T, e = 1000 k E T . The same Langevin thermostat for the dipole- 
based model is used for the charge-model, but without rotational degrees of freedom. In 
this case, the long-range interactions are computed using the Coulomb-P 3 M method (ik- 
differentiation) 13 ' 24 ' 40 '. 

Both models have been simulated via the simulation package ESPResSo 2 ^ 3 , which uses 
a velocity Verlet integrator. The parameters of the Coulomb and dipolar P 3 M algorithms 
have been tuned in each case to the optimal values to yield maximum speed for a force 
accuracy AF = 10~ 4 . Figure [7] shows the relative speed of the dipole-based method respect 
the charge-based model as a function of the number N of particles in the system. The 
relative speed has been computed by measuring the times and t q that the dipole and 
the charge models, respectively, need to integrate 20000 time steps. For the charge-model 
two different separations between charges d have been sampled because the optimal value of 
the Coulomb-P 3 M parameters (NM,P,r cu t,a) are observed to depend on d. In general, the 
smaller d, the lengthier the calculation of the long-range forces in the charge-based model. 
The case d = a/2 has been chosen because it represents the limiting case for mimicking 
dipoles. For d > cr/2 the distance between two charges belonging to a same particle can 
be larger than the distance between charges belonging to different particles, and thus the 
charge-model should be expected to be a poor approach to the dipolar interaction . The 
case d = er/10 represents a more likely value of d. The comparison in figure [7] shows that 
the dipole-based model shows in general a better performance than the charge-based model 
for both d = a/2 and d = a / 10. The relative performance of the dipole-model is observed 
to increase with the reduction of the distance between charges d. The advantage of the 
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dipole-based model respect to the charge-based model under the constrain that both models 
should deliver the same force accuracy AF = 10~ 4 must be related to the fact that the 
time needed to compute several extra FFT's required by the dipole-based model plus the 
handling of the dipole rotations is in general smaller than the extra time needed by the 
charge-model to deal with 2N electrostatic centers as well as the constrained movement of 
the charges inside the particle. 

Finally, it should be remarked that the time step dt needed to run adequately the MD 
simulations for the charge-based model has been found to be around two orders of magnitude 
smaller than for the dipole-based model when d = a/10, while similar time steps are possible 
for d = a/2. In principle this implies that for realistic charge-based models mimicking 
dipoles, d <C a/2, extra steps are needed to span the same physical time. Nonetheless, this 
difference in the values of the time steps could be due to the type of charge-model used 
in the current comparison. A test of the performance of the dipolar-P 3 M algorithm with 
all possible charge-based models is not possible, but the present comparison illustrates that 
dipole-based models are reliable tools for simulating dipolar systems. 

VI. CONCLUSIONS 

In this work, an extension of the P 3 M method of Hockney-Eastwood to the case of dipolar 
interactions is presented, using the ik differentiation scheme. This variant is expected to be 
the most accurate particle-mesh based algorithm. Optimal influence functions that minimize 
the errors for dipolar forces, torques and energy have been derived. We have shown that 
Madelung and self interaction terms will arise in any particle mesh method. We have 
derived estimates of these MS terms for the energy, force, and torques, and proved that, 
for the ifc-differentiation scheme, the force MS term is zero while the other terms are not. 
These MS interactions are responsible for a bias in the p3m energy, which we suppressed 
by shifting the energies appropriately. Using these results we derived accurate rms error 
estimates for the energy, forces, and torques. The validity of these estimates is demonstrated 
numerically by computing the errors for test systems with our P 3 M implementation, using 
various parameter sets, and comparing them to our analytical estimates. We have further 
demonstrated that using our simplified error formulas, the optimal a for any parameter 
combinations (Nm, r cut , P) can be accurately found. Consequently, these formulas enable to 
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determine the parameter combination that yields the optimal performance for any specified 
accuracy. This can be conveniently done prior to running an actual simulation. 

Although the derivation of the rms error assumed uncorrelated positions and orientations 
of the dipoles, we numerically showed that our estimates are sufficiently accurate also for 
highly correlated systems. 

The timing comparison between our dipolar-P 3 M algorithm and the standard dipolar 
Ewald sum shows that the performance of the P 3 M is superior to the standard Ewald 
method in systems consisting of more than 300 dipoles, and we see the expected (almost) 
linear scaling for large particle numbers. A protocol to speed up dipolar-P 3 M calculations 
for constant pressure simulations is presented in Sec VB| In addition, the test comparing 
a dipole-based model with a charge-based model to mimic simple ferrofluid systems shows 
that the use of dipole-based models can be advantageous. 

The somewhat tedious calculations necessary to derive our results have been collected in 
the appendices for the interested reader. 
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APPENDIX A: BUILDING UP THE P 3 M DIPOLAR ALGORITHM 



1. The optimal influence function 



In this appendix the analytical expressions for the optimal influence functions G are 



derived (see eq. (30)), and the measure Q int of the error for forces, torques, and the energy 



is provided (see eq. (29)). The derivation is done in close analogy to the derivation for the 
Coulomb case by Hockney-EastwoocP^. 
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The Parseval theorem for Fourier series 
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l/(r)rdr 
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(Al) 



fceK 3 



allows to rewrite the measure of the error Q 2 [T ( k >], eq. (29), for a system containing two 

dipolar unit particles (7*1, Ai) and (r 2 , A2) as 
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|t( fc '-)(fc, Ai, A 2 )l 2 - 2T^(r 1 , ft, Ai, A 2 ) • [t^)(fc, Ai, A 2 ) 

where we recall that function r^ ,ex )(r*i, r 2 , Ai) A2) = T^ k ' ex \r2 — ri, Ai, A2) is the (recip- 
rocal) dipolar Ewald interaction between two unit dipoles (this interaction corresponds to 
the dipolar interaction of dipole 2 with dipole 1 and with all the periodic images of dipole 
1), and that T^ k '(ri,r 2 , Ai, A2) is the corresponding interaction as computed with the P 3 M 
algorithm. Eq. (A. 2) involves the Fourier transforms of these functions over r 2 , at fixed 
position 7*1. The Fourier transform of the p3m interaction T^ k \rx, k, Ai, A2) depends on the 
position of dipole 1 within a mesh cell, while the Fourier transform of the exact interaction 
is independent of T\ because of translational invariance. 

The functions are linked to the mesh based functions = FFT[T^] by the simple 
relation 



V(*0 U(k), (A3) 
which is proved below in Set. A. 2. 

~(k) A k A k A k 

In turn, T^/ can be calculated from eqs. (23), (24), (28) and the fact that the Fast 



Fourier Transform of the mesh-density eq. (21) for a single particle system (ri,/xi) is (see 
refP) 



p M (k) = FFT[p M (r m )] = p **i e_ifc " 



where fc„ = k + tt-ti.. Thus, for the present P 3 M algorithm the functions are 



E(ri,k,ii x ) 
F(r x ,k, pi, 1*2) 



D(k) 4>p3m( r l, ft, Ml) ) 

-D(k)xfj, 2 ) </W(fi,fe,Mi)) 
-£>(fc)-/A 2 ) <P P 3m{ri,k,Hi), 



(A4) 



(A5) 
(A6) 
(A7) 
(A8) 
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where 



>!,*>, = U(k) G(k) Mi) e 



(A9) 



and C/(fc) = ^(fc)/^ 3 , k m = k - ^m, with £>(fc) defined in Q. The quantity 
4>pZm{ r Xi feiMi) is t ne Fourier transform (over r 2 ) of the electrostatic potential created at r 2 
by a dipole /Zi at r*i according to the P 3 M algorithm. Because of the presence of the mesh, 
that potential is not translationally invariant and depends on the position of t*i relative to 
the mesh. 

Once the functions T are known, the next step involves the calculus of the exact functions 
jp(ex) f Qr same system. It is straightforward to show that in the case of a system 
containing two particles the exact functions are 



.(ex) 



U ( d £X \k, /ii,/i 2 ) 



(k,fj,x,fj, 2 ) 



(ik ■ fjt 2 ) " Mi) ik 4>(k), 
(fjt 2 x ik) (ik ■ fjti) 4>(k), 
-(ik ■ nx) (ik ■ fj, 2 ) <f>{k), 



(MO) 
(All) 
(A12) 



where <p(k) is defined in (16). In exact calculations, as one would expect, only the relative 



distance between both particles (k coordinate in the reciprocal space) is relevant. 



Once the values of T, and T 7 ( ea: ) are known, it is possible to simplify the expression (A2) 



and arrive at the following expression for the rms error of the reciprocal-space components 



QL[T {k) ] 



9V 2 ^ 

feeM 3 

fc^O 



G 2 (k) \D(k\ 



\2S 



2_j l^rn 

m6Z 3 



25' 



-2G(k) £ (ik m -D(k)Y U 2 (k m ) 4>(K 



(A13) 



The set of parameters (S = 3, a = 1) leads to the measure of the error in forces, (S = 2, a = 
2) corresponds to the case of torques, and (S — 2, a — 1/4) must be used for the dipolar 
energy. In the case of the dipolar electrostatic field E, the values of the parameters are 
(S = 2,a = 3). 

The optimal influence functions for the different dipolar quantities (force, torque, and 
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energy) can be now obtained by minimizing eq. (A13) with respect to G, 
= 



6QL[T] 



5G 



(A14) 



G. 



opt 



The optimal influence function expressions obtained are summarized in eq. (30). Notice 
that the influence function optimized for torques is the same than for the energy, which is 
a consequence that for both cases it is necessary to optimize the dipolar electrostatic field 
since that the dipolar energy for a particle is Ud = —i-i ■ E, and its torque is r = x E. 

It should be noted that the influence functions are calculated to minimize only errors 
in p3m pair interactions, neglecting errors in MS interactions. In the case of forces, no 
further improvement can be expected because the MS forces are zero, but for torques and 
energies further optimisation is in principle possible. The benefit of such a full optimization 
is however expected to be small in typical systems because of the different scaling (with 
respect to the number of particles and dipoles moments) exhibited by these two sources of 
errors (see Set. B.3). 



2. Technical proof of eq. (A3) 



The Fourier series of a function T^ k \k) can be written using the mapping-back relation 



(see eqs. (26) and (28)) as 
fW(Jfe) 



I dr T^{rm)W{r-r m )e-^ r 

m.i.c. 



^ T«(r m ) W(k) e 



(A15) 



where the second equality follows from a change of variable (shift theorem) and the fact the 
W(r) decays to zero on a distance shorter than half the box length. If we replace T^\r m ) 
by the equivalent expression FFT' 1 ^^], we obtain 

W{k) 



f {k \k) 



V 



E E ^ (* 



(A16) 



fe'eM 3 r„ 

In order to do a further simplification, it is necessary to rewrite the sum over the mesh 
points r m as a continuous integral with the help of the sampling function ]J(r) defined as 



II(r)= E 5 (T-r m ) = ^ y E 



(A17) 



r m eK :i 



mat 
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Thus eq. A16 can be rewritten as 

f^ k \k) = U(k) E f M( fc ')f/ / dr e- l ^ m - k '> r 



(A18) 



fc'eM 3 mez 3 



where we used = W(k)/h 3 . The integral in (A18) divided by the volume is equal to a 



Kronecker delta 5 



k+'^m,k 



., which allows us to obtain the result 



f = u(k) E T M( fc/ ) 



(A19) 



which leads to eq. A3 



APPENDIX B: DERIVATION OF THE RMS ERROR ESTIMATES 



1. Errors in pair- interact ions and Madelung-Self interactions 



An important point in the calculation of the rms errors is to recognize that the error 



AT(i) =TH)-T {ex) ( 



(Bl) 



on quantity T(i) (energy, force or torque of a single particle i) can be understood to arise 
from two distinct contributions: the interaction of a particle i with all other particles j ^ i 
(including the images of particles j in the periodic replicas of the simulation box), hereby 



denoted by the subscript int, and the Madelung-Self interaction (see II C). Thus 



T{i) =T int (^)+T ms (2), 
T (cx) (f 



(B2) 
(B3) 



and therefore the error is 



AT(i) = AT w (i) + AT ms (z) 

= ^AT int (z,j) + AT rr 



(B4) 
(B5) 



In (B5), AT int (z, j) is the error in the pair interaction of particle i with particle j (including 



the interactions of % with the images of particle j ^ i). AT ms (i) is the error in the MS energy, 



force or torque of particle i. Explicit expressions for T mg (i) can be found in section B2 The 
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strength of a dipolar interaction is proportional to the product of the dipole moments of the 
two particles. Setting 

AT int (z,j) =^i£int(i,j) (B6) 
AT ms (i) = tf£ ms (i), (B7) 



(B5) can be rewritten as 



AT{t) =HiJ2 Mmt(^j) + Mi&nsO')- (B8) 

By definition, ^mt(^j) and £ ms (i) give the direction and magnitude of the error for two 
unit dipoles [i stands for (t^, /*»)], for pair- and MS interactions respectively. The decom- 



position (B8) of the error into an interaction and MS contribution is a central point in the 



calculation of the rms errors, because both contributions are uncorrelated and lead to a 



different scaling with respect to the dipole moments (see further Set. B3). 



2. Mean MS values of the quantities 

In this section we prove several expressions related to the mean values of the Madelung- 



Self forces, torques and energies used in section II C 



a. Derivation of -Fms (f,/-*) = 



The reciprocal contribution of the MS force of a particle is, 

^A)4E ^ F(r,k,vL 1 = »,» 2 = n) (B9) 



V , 

fcGK 3 
fe^O 



which using equation X6 reduces to 

= (D(k)-»y (-D(k))U(k) 

fceK 3 

fe#0 

G(k) e~ l{2w/h)m - r = (BIO) 

mez 3 

The previous sum is zero because each k term cancels out with the corresponding —k 
term (provided the lattice that is used is symmetric). Madelung-Self forces vanish therefore 
identically. 
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b. Derivation of (r^(r,fi,))=0 



The MS torque for a single particle can be written as 



— elkr T{r,k,ni = V,V2 = At) 



(Bll) 



fe^O 



where r is given by eq. (A7). Writing explicitly the average, the following expression is 
obtained 



fcSK 3 
fe#0 



D(k)-n) l-D(k)x n)U(k) 



G(k) £ C/(fc m ) e" 



ikm-r _ g 



This average torque vanishes because 



ciO,, (£>(*:) x /z) (£>(fc) • /x) = 0. 



(B12) 



(B13) 



c. Calculus of (Ums{f, A)) leading to eq. (40) 



The MS energy for a single unit dipole particle can be obtained from eq. (A8) by setting 



A*i = A*2 = A; r i = r an d evaluating the back- Fourier transform at the point r 2 = r: 



-E 



r. /i. i = — 7 ^ e l r C/ d (r, fc, /^ = p,, fi 2 = fi) 



fcez- 5 
fe^o 



e ^(D(fc). M ) (/(fc)G'(fe) 



2V 



fe^O 



t km • V 



(B14) 



nidi 



where k m = k + (2n/h) m. Applying the average defined in (44) and using the identity 
^ j dre~ ir < 2 ^ m = 6 m>0 (B15) 

where 5 is a Kronecker symbol, and the angular integral 

i- f dft M (5(fc) • = l - £>{kf ^ (B16) 
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lead to 



(B17) 



fe6K 3 
fe^O 



The functions D{k) and are periodic over the Brillouin cells, which allows to rewrite 



the mean value of the MS energy for a single dipole particle as eq. (40). 



3. Scaling of the rms errors 



In this section, the scaling of the rms error estimates for the forces, and torques with 
respect to N and {/Xj} is derived using general arguments. The results of the present section 
also apply to the error of the energy of single particles, but not directly to the error of 
the total energy because it involves all possible pair interactions and an extra correction 



term (37). The error of the total energy will be discussed apart in section B5 



First, it should be noticed that the surface terms [eq. (13) and last term in eq. (18) 
do not lead to any error, because they are computed exactly. Therefore from now metallic 
boundary conditions (e' = oo) are assumed, and surface terms are discarded . Assuming the 



system to be relatively large, eq. (42) can be approximated as 



AT -^E<( AT «) 2 >> 

by following the line of reasoning of ref P. 



(B18) 



According to (B8), the error AT(i) arises from errors in pair-interactions and error in 
MS interactions. With the energy shift (37), the P 3 M algorithm is such that the error is 



zero on average ((AT(i)) = 0), as it should. This implies 

(en»(i)) = o 

(6nt0',.7')>=0. 



(B19) 
(B20) 



The stronger statement that the average error of the pair-interaction still vanishes even if 
dipole i is kept fixed, 

1 



dSljtiatfaj) = 0, 



(B21) 
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holds because the angular integral clearly vanishes (the integrand is odd in fij). The prop- 



erty (B21) implies in particular that 



(imt{i,3) ■ imt(i, k)) = 5 j>k (£? nt (i,j)) . 



(B22) 
(B23) 



The mean-square error (AT(i) ) in (B18) becomes 



(B24) 



where the second equality follows from (B22) and the third equality from (B23). The mean 



square errors of the pair and MS interactions, 



(B25) 
(B26) 



do not depend on the chosen pair of particles by definition of the configurational 

average. The mean-square error on particle i reduces (using (M 2 — nf) ~ M 2 ) to 



4/n2 

ins 



Eventually, it is found that the rms (total) error (B18) can be expressed as 

, 2 „MH& t + J LirtQL 



AT 2 



N 



where, using (B25) 



(B27) 



(B28) 



(B29) 



is the mean-square error in the pair interaction between two unit dipoles (see eq. (B6)) and 



(B30) 



is the mean-square error in the MS interaction of a unit dipole (see eq. (B7|). Notice that 



the average over r\ in eqs. (B29) and (B30) can be restricted to a single mesh cell h 3 thanks 



the periodicity of the system. 
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The result (B28) exhibits the scaling of the rms error AT with respect to the number of 
particles and the magnitudes of the dipole moments. 

It is important to stress that our result for the scaling of AT takes into account not only 
the contributions from errors in pair-interactions, but also errors in MS interactions. When 
using standard dipolar Ewald sums, rms errors in MS interactions are negligible (at least 



if the energies are correctly shifted^ and (B28) reduces to the expression found in refP^ 



for the scaling of the error. By contrast, the errors due to MS dipolar interactions play an 
important role when using Particle-Mesh methods, because of the loss of accuracy brought 
by the discretization of the system onto a mesh. 

4. Explicit formulas for the rms errors 



To use the error estimate (B28), we need to know the mean-square errors Q( nt and Q„ 1S , 
which measure, respectively, errors in the pair-interaction Ti nt (i, j) and errors in the MS 
interaction T ms (i). These errors depend on the details of the method employed to compute 
them (here the P 3 M algorithm), but are independent of the simulated system. In this section 
explicit theoretical expressions for these errors are derived. These expressions are functions 
of the "methodological" dimensionless parameters (aL, r cut /L, Nm = L/h and P). It should 
be recalled that surface terms are discarded by setting metallic boundary conditions, because 
these terms do not play any role in the error estimates. 

The quantity T (= force, electrostatic field, torque or energy) is computed as a sum of a 
real-space contribution and a reciprocal-space contribution + T^ scU ' ) [T^ sclf ^ vanishes 



in the case of the force, see eqs. (\9\), (17) and (18)]. If the errors in these two contributions 



are assumed to be statistically independent, it can be written with (B28) and (B29) in mind, 



(at) 2 ~(atM) 2 + (at«) : 



(B31) 



where AT^ is the rms error arising from the real-space contribution, and AT^ is the 
rms error arising from the reciprocal-space contribution. These two rms errors are given 



by eqs. (B28)-(B30), in which the mean-square errors Qf nt and Q^ s are computed with the 



direct-space, respectively reciprocal-space, contribution to AT(z) only. 
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a. 



Error estimates for real-space contributions 



Introducing decomposition (B4), the real-space contribution to the rms error (B31) splits 
into two terms 

(ATM) 2 = ( AT W)3 + (AT W)2 (B32) 
where AT^ is the rms error of (real) pair interactions and AT^ is the rms error of (real) 



Madelung interactions. No cross-term appears in (B32) because of property fB22|. (AT^) 2 
is negligible due to the fast decay of the real-space contribution. Thus, (ATi^s) = 0, and the 
real-space rms errors of the P 3 M method approximately coincide with those derived for the 
dipolar Ewald sum method^, because real-space contributions are evaluated identically in 
both methods. These error estimates are given by (46), (51 ) and (55) (see also ref. ^j). Notice 
the exponential decay exp(— a 2 r^ ut ) of the error with the real-space cutoff distance r cut . 



b. Error estimates for reciprocal-space contributions 



Introducing decomposition (B4), the reciprocal contribution to the rms error (B31) splits 
into two terms 

( A rW) 2 =(AT«) 2 +(AT(f)) 2 



(B33) 



where AT^ is the rms error of (reciprocal) pair interactions and AT^J is the rms error of 



(*0 



(reciprocal) MS interactions. No cross-term appears in (B33) because of property (B22). 



By (B28), these two contributions scale like 
M 4 



N 



■QUT (k) ] 



(AT£)) 2 = ^Ql s [T {k) ] 



(B34) 
(B35) 



where Qf nt [T^] (resp. Qm S [T^ k ']) is the contribution to the mean-square error (B29) 
(resp. (B30)) associated to the reciprocal interaction T^ k \ The problem of predicting the 
rms errors of the P 3 M algorithm is now reduced to finding explicit expressions for the 
functions Qf nt [T^} and ^[T^]. The detailed calculation of these quantities is performed 
in section A 1| for the pair interactions, and section B 4b for the MS interactions. For the 



total energy see section B 5 
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(b.l) rms error in pair-interactions: AT- 



(A:) 



The lattice Green function G(k) is determined in the P 3 M method by the condition that 
it minimizes the rms error ATjjj^ of the (reciprocal) pair-interaction. The minimization 
of this rms error was performed in App. |A 1[ where is it shown that the minimal errors 



are given by eq. (49) where in the case of forces, we have to use the set of parameters 



(S = 3, a = 1), for torques (S — 2, a = 2), and (S = 2, a = 1/4) for the energy. It should be 



noticed that eq. (49) reduces to the rms error corresponding to Coulomb forces when the 



parameters are set to (S — l,a — 1) and the factor 1/9 is dropped^. When the optimal 



lattice Green function (30) is used, the (reciprocal) rms error in pair-interaction is given by 



inserting (49) into (B34) 



(b.2) rms error in MS interactions: AT, 



From (B35) and (B30), the rms error in MS interactions involve the quantity 



QL[T (k) ] = ( (rW(r,A) -r£-«)(r,A)) ! 



IIC 



(B36) 
for a unit 



where T^(r, p.) is the P 3 M Madelung-Self interaction defined in section 
dipole. The exact MS interaction Tm S ' ex \r } (x) is non-zero only in the case of the energy. 



Since the P 3 M MS force is identically zero (see section B 2 ) the rms error vanishes for this 
quantity: 



AF<9 = 0. 



(B37) 



According to (|B11|), the rms error of MS torques is given by 
) 2 \A k ) 



QL[r {k) ] = ((r£ ) (r,A)) a ) = i ^3 / dr I d ^ £ £ (*>(*) • A)' ( D ( k ') ' A 

fc^o feVo 

D(k) x ft) ■ (-D(k') x ft)} U{k)G(k) U{k')G{k') 



U{k m ) e 

J(k+k')-r 



ik' r 
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where k m = k + (2ir/h)m and fc^ = fc' + (2ir/h)n. The integral in eq. (B15), and the 
angular integral 

/ (a • jx) (b • n) [(a x /x) • (b x M )] = 2 ^h(a, b) (B38) 
where h(a, b) is given by eq. ([53]) , lead to 



QL[r (fe) ] = ^EE h{D(k),D(k')) U(k)G(k) (B39) 



fees 3 fe'eB 
fc^o fcVo 



VrneZ 3 

Finally, using the fact that D(k) and G(k) are periodic over the Brillouin cells, the mean 



square MS torque for the reciprocal contribution reduces to the expression given in (52) 



5. Rms error for the total corrected energy 

A theoretical estimate can be derived for the rms error of the total energy AUp^M m 



eq. (43). Hereby in order to avoid confusions the values related to non-corrected energies 
will be identified with a sub index (nc). As in the case of forces and torques the error is split 
into real and space contributions 

(Af/p 3 M) 2 = (Af4 r 3 } M ) 2 + (AU^ M ) 2 . (B40) 

As in the case of forces and torques, the fast decay of the real-space interaction makes the 
MS contribution arising from the real-space negligible. Thus, the value of (^Up^) is the 



same than in Ewald calculations^ (see eq. (55)). 



The rms error of the reciprocal-part of the energy is by definition 

A£4?m) 2 : = ((U$L-U W ) 2 )> (B41) 
where U^ k ' is the exact reciprocal-space energy given by eq. (11). The energy correction 



term (37) is fully associated to the calculations in the reciprocal-space when m.i.c. is used. 



Thus, eq. (B41) can be rewritten in terms of (U^ corr ^ and the reciprocal-space error of the 



non-corrected energy as 



(AC41) := ((A^ + (C/(-)» 2 ) (B42) 
(AC + *U£L + (U^)) 2 ) . (B43) 
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applying that 



(A^ int ) = (B44) 
(A^LA^cU) = (B45) 



the rms error for the reciprocal contribution is 

2 / / „x \ 2 



(a^m) := ((AUi%) J + ((A^) ms + (U^)) 2 ) . (B46) 
If the relation ^AL^ms) « - (f/( corr )) is used, then 

(a^m) 2 := (W3nt) 2 ) + ((A^ ms ) 2 ) - «^» 2 , (B47) 



which shows that the correcting term (U( corr ^, in addition to removing the systematic bias 
in the reciprocal-energies, also reduces the fluctuating errors of the reciprocal-space self- 
energies by an amount — (^[/( corr ))) 2 . In the following sections (a, b, and c) it is shown 
that 

((At/£nt) 2 ) = 2M 4 C[^] (B48) 

(B49) 

and / (AU^m^j \ is given by 

'(A^? ms ) 2 ) = M 4 [(([/ ms (r,A))) 2 -2f/( e s x) (r,A) (U ms (r,fi)} + (U^(r,p,))' 

+ (X>*) [((U ms (r,fi)) 2 ) - ((U ms (r,fi))) 2 ] (B50) 

where the mean P 3 M MS energy of a unit dipole particle (U ms (r,fi)) is (40), the exact MS 
energy Ums\r,fi) is (35) and (jj( C011 ^ is given by eq. (37). 



On the other hand, in section c is shown that the mean square MS energy of a unit dipole 
in the P 3 M calculation is 



fcSM 3 fc'GM 3 

fc#o feVo 

EEE U{k tm )U(k' lm )} , (B51) 



tez 3 iez 3 mez 3 
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where 



f(a, b) 



a + b\ A + la - 61 



— a 



(B52) 



with k a = k + (27r//i)a , and A;^ = fc + (2ir/h)(cx + /3). Similar techniques to the ones 
used in the case of torques can reduce by several orders of magnitude the computational 
effort, rendering its exact calculation feasible, although for practical purposes to determine 



the rms energy error it is advisable to use the approach stated in Set. HI C 



a. Derivation of ( ( AU 



nc. int 



In this section, the mean square value of the pair energy of the non-corrected interactions 



AU 



(*) 



is derived. Using eq. (|B8|) the pair energy of a system of N particles can be written as 

(B53) 

i j k^i tyj 

applying 



N N N N 

E EE E ^ & ^ k m (&*( i '> k ) ■ £tatO*> 0) 

i j k^i tyj 



(£int(«, k) ■ $tot(j, /)) = (5ij 5 k ,i + 8i,i 5 kJ ) (£f nt (z, k)) 
the rms error can be written (using the approach (M 4 — ^ nf) ~ M 4 as 

2V N 



AU (k) - 



where (£? t (l,2)> = Qf nt [^] (see eq. pggb). 



2M 4 <^(1,2)> 



(B54) 



(B55) 
(B56) 



6. Derivation of ( ( AJJ^c n 



In this section, the mean square value of the MS energy of the non-corrected interactions 
is derived. For a system of ./V particles it can be expressed in terms of the non corrected 
P 3 M and exact MS energies of each particle, Unc]ms{i) and Ums\i) respectively, as 



N 



E WS-W - 



N 



(B57) 
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where the asterisk denotes complex conjugate, Ums\i) = nfUms'{r, ft) with Ums'{r,£i) 



. 2 J 7"( ex ) / 



r(ex) , 



given in (35), and 



(B58) 



notice that the surface energy terms have been dropped because they would be the same 
and would just cancel out. Some algebra, and a careful separation of the terms i ^ j from 



the i = j terms, leads to eq. (B50). 



c. Proof of ((L^(r,A) 



Taking the square of eq. (B14) and using the average given in (44) we get 



(U^S(r,P,)) 2 ) = -Lj /dr / dn,J2 E (^( fc )-A) 2 (B59) 
D{k')-^ U{k)G{k) U(k')G(k') | ^ #(fc m ) e" ifc " 



The integral in eq. (|B15|) and the angular integral 



/ dfl^ (a ■ fi) 2 (b ■ nY 



2tt/^ 4 



15 



(B60) 



where /(a, b) is given in eq. (B52), lead to 

\u^(r,d)) 2 ) = E f(D(k),D(k')) U(k)G(k) 



fcGK 3 fc'GK 3 
fe^O fc'^O 



U{k')G{k') [ U&rjUiK 

vroGZ 3 



(B61) 



Finally, taking into account that D{k) and G(fc) are periodic over the Brillouin cells, the 
final expression for the rms MS energy is eq. ( B51[ ). 
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APPENDIX C: CAPTION LIST, TABLES AND FIGURES 



TABLE I: Definitions of the various transforms between real-space and reciprocal 
space: Fourier transform of a non periodic function (first line); Fourier series of a 
periodic function (second line); and Finite Fourier transform of a mesh-based function 
(third line). 

TABLE II: Exact versus P 3 M Madelung-Self interactions. The mean and rms error of 
MS interactions are computed by taking an average over all positions and orientations 
of the dipole moment. 

FIGURE 1: The rms error AF of the forces (circles) for a system of 100 randomly 
distributed dipoles with Nm = 32 mesh points and real space cutoff r cut = 4. Box 
size L = 10. From top to bottom, the order of the charge assignment function, P, is 



increased from 1 to 7. The solid lines are the theoretical estimates (eq. (45)). In the 
inset, the order of the assignment function is P = 3, and the number of mesh points 
per direction is varied (from top to bottom): Nm £ {4, 8, 16, 32, 64}. 

FIGURE 2: Computational and theoretical rms error of the torques At for the same 
system as in figure [I] The dotted lines in the inset plot show two examples of the 
deviations observed at large values of the splitting parameter a when the errors due 



to MS torques are neglected in the evaluation of the rms error estimates (eq. (50)). 



FIGURE 3: Comparison of the theoretical estimates for the rms errors of the energy 



(eq. (54)) (solid line) with the corresponding numerical rms errors (circles). Several 
values of the charge assignment order P 6 [1,5] are depicted for systems with box 
length L = 10, number of dipoles N = 100, cutoff parameter r cut = 4, and mesh size 
Nm = 32. The numerical rms error of the energy is computed averaging over 100 



random conformations, using eq. (43). The inset shows a comparison between the 



rms error obtained using the energy correction [/( corr ) [eq. (37)] (solid lines), and the 
rms error when no energy correction is applied (dashed lines) for P = 3 and different 
mesh sizes Nm e {4, 8, 16, 32}. 

FIGURE 4: Similar comparison as in figure [3] for systems with different number of 
particles and dipole moments: (N = 1000, = 1), (N = 2000, \n\ = 5), and (N = 
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4000, \fi\ = 25). The box length is set to L = 21.54, assignment order P = 4, and 
mesh size Nm = 32. Dashed lines depict the rms errors when MS and energy correction 



terms are dropped out from expression (54), see eq. (56). 



FIGURE 5: Comparison of the theoretical rms estimates of forces and torques pre- 
dicted for random conformations versus the numerical rms errors for a typical con- 
formation in a simulation of a ferroffuid system^. Number of particles N = 1000, 
diameter a = 1.58, dipolar coupling parameter A = 3, and volume fraction <ft v = 0.3. 
The particles are under the influence of an external magnetic field along the z axis 
characterized by a Langevin parameter oll = 2.0. 

FIGURE 6: Time required to compute forces and torques as a function of the number 
of particles in the system using a typical desktop computer. The computing time t is 
given in seconds. Circles denote the optimal dipolar-Ewald method, and squares the 
new dipolar P 3 M method. In both cases, their respective parameters have been tuned 
to give maximum speed at fixed force-accuracy. The accuracy is set to AF = 10~ 4 . 
The density of particles is p = N/V = 0.1. Lines with slopes 1 and 3/2 are plotted 
to guide the eye. The inset plot shows the relative speed of dipolar-P 3 M method 
compared to the fastest dipolar Ewald-sum as a function of the number of particles in 
the system. 

FIGURE 7: Relative speed of the dipole-based model to the charge-based model as 
a function of the logarithm of the number of particles in the system (see details for 



the models in text, Set. VC) . t M and t q are the times needed by the dipole-based 
and the charge-based models respectively to integrate 20000 time steps. In all systems 
the number density is N/V = 0.1, and the algorithm parameters has been set for 
each system to the optimal values to yield maximum speed at fixed force accuracy 
AF = 10" 4 . 
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TABLE I: Table I 
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Exact Madelung-Self interaction 

P 3 M Madelung-Self interaction 
Average error 
Rms error 



Energy 



2a d 



3aAF 



2tt 
3L3 



eq. (B14) 



eq. (39) 



eq. (B50) 



Force 








Torque 



eq. pTTj ) 
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TABLE II: Table II 
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FIG. 1: Figure 1 
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FIG. 2: Figure 2 
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FIG. 4: Figure 4 
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FIG. 7: Figure 7 
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